Practical Lineage Tracing in Data Warehouses
Authors
Abstract
We consider the view data lineage problem in a warehousing environment: for a given data item in a materialized warehouse view, we want to identify the set of source data items that produced the view item. We formalize the problem and present a lineage tracing algorithm for relational views with aggregation. Based on our tracing algorithm, we propose a number of schemes for storing auxiliary views that enable consistent and efficient lineage tracing in a multi-source data warehouse. We report on a performance study of the various schemes, identifying which schemes perform best in which settings. Based on our results, we have implemented a lineage tracing package in the WHIPS data warehousing system prototype at Stanford. With this package, users can select view tuples of interest, then efficiently drill through to examine the exact source tuples that produced the view tuples of interest.
Similar resources
Investigating a heterogeneous data integration approach for data warehousing
Data warehouses integrate data from remote, heterogeneous, autonomous data sources into a materialised central database. The heterogeneity of these data sources has two aspects, data expressed in different data models, called model heterogeneity, and data expressed within different schemas of the same data model, called schema heterogeneity. AutoMed is an approach to heterogeneous data transfor...
Research Problems in Data Provenance
The problem of tracing the provenance (also known as lineage) of data is a ubiquitous problem that is frequently encountered in databases that are the result of many transformation steps. Scientific databases and data warehouses are some examples of such databases. However, contributions from the database research community towards this problem have been somewhat limited. In this paper, we mot...
Lineage Tracing in a Data Warehousing System
A data warehousing system collects data from multiple distributed sources and stores the integrated information as materialized views in a local data warehouse. Users then perform data analysis and mining on the warehouse views. Figure 1 shows the basic architecture of a data warehousing system. In many cases, the warehouse view contents alone are not sufficient for in-depth analysis. It is often...
Lineage Tracing in a Data Warehousing System Demonstration Proposal
A data warehousing system collects data from multiple distributed sources and stores the integrated information as materialized views in a local data warehouse. Users then perform data analysis and mining on the warehouse views. Figure 1 shows the basic architecture of a data warehousing system. In many cases, the warehouse view contents alone are not sufficient for in-depth analysis. It is often usefu...
Data Lineage: A Survey
Lineage, or provenance, in its most general form describes where data came from, how it was derived, and how it was updated over time. Information management systems today exploit lineage in tasks ranging from data verification in curated databases [1] to confidence computation in probabilistic databases [10, 12]. Here, we formalize and categorize lineage, discuss a set of selected papers, and ...